Algebraic Structure of Some Learning Systems
ABSTRACT
Our goal is to define some general properties of the representation languages (e.g. lattice structures, distributive lattice structures, cylindric algebras, ...) on which generalization algorithms could be based. This paper introduces a formal framework providing a clear description of the version space. It is of great theoretical interest, since it makes the generalization and the comparison of many machine learning algorithms possible. Moreover, it could lead us to reconsider some aspects of the classical description of the version space. In this paper, we shall restrict the scope of the investigation to lattices, i.e., to cases where there exists one and only one generalization for any set of examples. More precisely, we consider a particular kind of lattice: Brouwerian lattices. It is shown that a particularly interesting case covered by this restriction is the product of hierarchical posets, which is equivalent to the conjunction of tree-structured or linearly ordered attributes.

1. INTRODUCTION

In the past, very little attention has been paid to the mathematical structure of the objects involved in machine learning processes. For example, the notion of a linearly ordered or tree-structured attribute, which is commonly used in machine learning, is not related to well-defined mathematical properties. In fact, most of the time, machine learning mechanisms rely on the introduction of an ordering relation which is related to the notion of generality or to the notion of subsumption. This ordering relation by itself restricts the range of mathematical frameworks which can structure the representation language.

The traditional artificial intelligence approach defines knowledge representation languages before defining the properties of those languages. In the case of machine learning, there is no precise definition of the language properties which are required by the learning algorithms. This leads to some confusion, since the limitations of the representation languages and the limitations of the algorithms which manipulate their expressions are not clearly distinguished. For instance, in the case of ID3-like induction systems, it appears that the attribute-value representation is a particular case of a more general representation language which could easily extend those systems. However, the classical description of the algorithm does not make the extension to more general languages obvious, since it is limited to representation languages based on attribute-value structural descriptions.

Our goal here is to define some general properties of the representation languages (e.g. lattice structures, distributive lattice structures, ...) on which generalization algorithms could be based. This is of great theoretical interest, since it makes the generalization and the comparison of many machine learning algorithms possible. Nevertheless, it should not be confused with learnability, whether in Gold's [Gold 67], Valiant's [Valiant 84] or other learning paradigms. The present goal is not to define general limitations of learning mechanisms but to relate machine learning algorithms to the mathematical properties of the manipulated objects. On the other hand, there have been some attempts to define a general learning framework using the notion of version space, but recent studies show that, in practice, this framework is not usable (see [Haussler 88] or [Hirsh 92]).

We shall restrict our presentation to lattices. This means that each set of descriptions has one and only one least general generalization.
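As a concrete illustration of this restriction, consider tree-structured attributes, where the unique least general generalization of two values is their closest common ancestor in the taxonomy, and the generalization of two examples is computed attribute by attribute over the product of such taxonomies. The following sketch is ours, not the paper's; the "shape" taxonomy and all names are hypothetical:

```python
# Illustrative sketch: least general generalization (lgg) over
# tree-structured attributes. Each attribute value is a node in a
# taxonomy; the unique lgg of two values is their closest common
# ancestor, and the lgg of two examples is computed component-wise.

parent = {  # child -> parent links of a toy taxonomy (hypothetical)
    "triangle": "polygon", "square": "polygon",
    "polygon": "shape", "circle": "shape",
}

def ancestors(value):
    """Chain of ancestors from a value up to the root."""
    chain = [value]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def lgg_value(a, b):
    """Closest common ancestor: the unique least general generalization."""
    seen = set(ancestors(a))
    return next(v for v in ancestors(b) if v in seen)

def lgg(example1, example2):
    """lgg of two examples = component-wise lgg (a product of posets)."""
    return tuple(lgg_value(a, b) for a, b in zip(example1, example2))

print(lgg(("triangle",), ("square",)))   # ('polygon',)
print(lgg(("triangle",), ("circle",)))   # ('shape',)
```

Because every pair of nodes in a tree has exactly one closest common ancestor, the lgg is unique, which is precisely the lattice restriction stated above.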
This restriction covers many applications in machine learning: for instance, it covers all the ID3-like systems [Quinlan 1983, 1986]. It does not, however, cover the case where matching is multiple, i.e., where first-order logic is required (cf. [Muggleton and Feng 1990]). In those cases, the notion of cylindric algebra has to be introduced. This can be seen as a generalization of the present work, but the principles on which it relies are similar to those presented here.

2. INTRODUCTION TO VERSION SPACE

Introduced by T. Mitchell [Mitchell 82], the version space has been seen as a general framework in which every machine learning algorithm could be described as a search algorithm. In this framework, similarity-based learning (SBL) can be summarized by the following points (cf. [Mitchell 82]):

"Given:
  - a language in which to describe instances
  - a language in which to describe generalizations
  - a matching predicate that matches generalizations to instances
  - a set of positive and negative training instances of a target generalization to be learned
Determine:
  - generalizations within the provided language that are consistent with the presented training instances (i.e. plausible descriptions of the target generalization)"

It is assumed that the space of generalizations is ordered by the relation "is more general than", noted ≤g, which is defined by:

G1 ≤g G2 if and only if {i ∈ I | M(G1, i)} ⊇ {i ∈ I | M(G2, i)}

where M(G, i) means that the generalization G matches the instance i, M being the matching predicate. The set of all consistent hypotheses is defined by two sets: the set of maximally specific generalizations, noted S-set, and the set of maximally general generalizations, noted G-set. Adding positive and negative instances of the target concept makes the two sets move toward each other: the S-set becomes more general and the G-set more specific. The algorithm stops when the S-set equals the G-set or when some inconsistency arises.

Practical [Bundy et al. 85] and theoretical studies showed that this framework is actually not usable. The first reason is that the number of examples required to ensure the convergence of the algorithm can be exponential in the problem size. The second is that the size of the G-set can also become exponential, even in some trivial cases like the one given in [Haussler 88]. For the sake of clarity, let us recall the examples given in [Haussler 88].

On one hand, Haussler shows that if the instance space X is defined by the boolean attributes A1, A2, ..., An, and if the target concept h is supposed to be A1 = true, we need more than 2^(n-2) positive examples and more than 2^(n-2) negative examples, even if the hypothesis space is restricted to pure conjunctive concepts. The reason is that there are 2^(n-2) positive examples such that A1 = true and A2 = true, so if we want to distinguish A1 = true from A1 = true & A2 = true, we need more than 2^(n-2) positive examples. The same argument can be applied to negative examples. Therefore, the number of examples needed is exponential in the number of attributes.
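This counting argument is easy to check mechanically for a small n. The following sketch is ours, not from [Haussler 88]; it simply enumerates the boolean instances and counts those satisfying both A1 = true and A2 = true:

```python
# Small-scale check of the first argument: with n boolean attributes,
# exactly 2^(n-2) instances satisfy both A1 = true and A2 = true, so a
# teacher could present that many positive examples of "A1 = true"
# without ever separating it from "A1 = true & A2 = true".
from itertools import product

n = 6
instances = list(product([True, False], repeat=n))  # all 2^n instances

ambiguous = [x for x in instances if x[0] and x[1]]  # A1 and A2 both true
print(len(ambiguous), 2 ** (n - 2))  # 16 16
```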
& Aij = true, for some {A1, A2, ..., An} ⊇ {Ai1, Ai2, ..., Aik} and (2) h must contain the following atoms: either the atom A1 = true or the atom A2 = true to exclude the first counter example either the atom A3 = true or the atom A4 = true to exclude the second counter example . . . . . . either the atom An-1 = true or the atom An = true to exclude the last counter example Therefore, it is easy to show that the maximally general concept which meet both (1) and (2) is a disjunctive normal form containing at least 2n/2 conjonctions. It follows that the size of the G-set is exponential with the number of counter-examples. Many solutions have been proposed to solve the difficulties encountered. For instance one propose to modify the learning bias (Cf. [Utgoff 88]). Another proposed to consider only a list of negative instances to represent the G-set [Hirsh 92]. It is also possible to provide new ad-hoc representations of the version space [Nicolas 93] or to decompose the generalization language on a product of attributes [Carpinetto 92] etc. It appears that all those solutions are restricted to particular cases, for instance to the cases where the Sset is conjunctive or/and where the generalization space is a lattice — i.e. each set of instances has one and only one generalization. We claim that the learning problem being stated as above, it is possible to get a better formalization than the one proposed by the classical version space. It is just necessary to add the hypothesis that the instance language is ordered by the generality relation. Then, the S-set and the G-set are related to the instance language and not to the generalization language which is just used to compute an efficient generalization. In this framework, we do not have to make the S-set and the G-set converge since it is only necessary to have a maximally general generalization consistent with the instances, i.e. a G-set. To clarify our ideas, let us restrict to the case where the description language is a Brouwerian lattice (Cf. Appendix) and let us formalize the notion of S-set and G-set in this context. 3 . FORMALIZATION OF THE LEARNING PROBLEM It is possible to formulate the learning problem as it was introduced by T. Mitchell (see above) using elementary lattice theory notions (see appendix). To do so, let us suppose that given a set E = {e1, e2,..., en} of positive instances and a set CE = {ce1, ce2,..., cem} of negative instances, a concept C has to be learned. We shall assume that the positive instances, ei, and the negative instances, cej, are described as points of a representation space R which is ordered by the generality relation ≤g. We shall also assume that R is a Brouwerian lattice which means (Cf. Appendix) that for each pair {a, b} there exists a least upper bound of a and b, noted (a ∨ b), a greater lower bound of a and b, noted (a ∧ b) and a pseudo-complement of a related to b,